SUMMARY

The nucleotide sequence of the Bacillus anthracis lethal factor (LF) gene (lef) has been determined. LF is
part of the tripartite protein exotoxin of B. anthracis along with protective antigen (PA) and edema factor (EF).
The apparent ATG start codon, which is located immediately upstream from codons which specify the first
16 amino acids (aa) of the mature secreted LF, is preceded by an AAAGGAG sequence, which is its probable
ribosome-binding site. This ATG codon begins a continuous 2427-bp open reading frame which encodes the
809-aa LF-precursor protein with an Mr of 93798. The mature secreted protein (776 aa; Mr 90237) was
preceded by a 33-aa signal peptide which has characteristics in common with leader peptides for other secreted
proteins of the Bacillus species. The codon usage of the LF gene reflects its high (70%) A + T content. The
N-terminus of LF (first 300 aa) shared extensive homology with the N-terminus of the anthrax EF protein.
Since LF and EF each bind PA at the same site, these homologous regions probably represent their common
PA-binding domains.



INTRODUCTION 

Bacillus anthracis, which causes anthrax, infects 
many mammalian species, including humans. The 
virulence of anthrax bacilli is due to the production 
of a poly-D-glutamic acid capsule and a three-com- 
ponent protein exotoxin (Leppla et al., 1985). The 
toxin proteins have been purified and consist of PA, 
EF and LF (Beall et al., 1982; Ezzell et al., 1984;

Correspondence to: Dr. D.L. Robertson, 659 WIDB, Brigham
Young University, Provo, UT 84602 (U.S.A.) Tel. (801)378-7018;
Fax (801)378-5474.

Abbreviations: aa, amino acid(s); B., Bacillus; bp, base pair(s);
cya, gene coding for EF; EF, edema factor (adenylate cyclase);



Leppla et al., 1985). The lethal toxin, which contains 
PA and LF, causes death in rats, guinea pigs and 
mice (Beall et al., 1982; Little and Knudson, 1986). 
The edema toxin, which is composed of PA and EF, 
produces a localized edema in the skin of guinea pigs 
and rabbits (Stanley et al., 1960; Thorne et al.,
1960). Each of the anthrax toxin genes has been
cloned (Robertson and Leppla, 1986; Tippetts and 
Robertson, 1988; Vodkin and Leppla, 1983; Mock 

kb, kilobase(s) or 1000 bp; lef, gene coding for LF; LF, lethal
factor; nt, nucleotide(s); oligo, oligodeoxyribonucleotide; ORF,
open reading frame; pag, gene coding for PA; PA, protective
antigen; RBS, r.b.s., ribosome binding site(s); ss, single
strand(ed); tsp, transcription start point(s); USAMRIID, United
States Army Medical Research Institute of Infectious Diseases
(Frederick, MD).
et al., 1988) and the nt sequences for the PA (pag)
and EF (cya) genes have recently been reported
(Welkos et al., 1988; Robertson et al., 1988).

EF is a calmodulin-dependent adenylate cyclase 
(Leppla, 1982, 1984; Leppla et al., 1985) which may 
be involved in the formation of the edematous lesions 
in cutaneous anthrax. EF shares extensive aa and nt 
sequence homology with the Bordetella pertussis
calmodulin-dependent adenylate cyclase (Robertson
et al., 1988), but is not related to the Escherichia coli
or yeast adenylate cyclases. The lethal effects of LF 
are well documented, but the mode of action is not 
known, although cell disruption apparently occurs 
(Friedlander, 1986). PA apparently has no en- 
zymatic activity, but is required for toxin uptake. 

In order for the toxin proteins to enter a mam- 
malian cell, PA must first bind to a cellular receptor, 
the identity of which has not yet been determined. 
Once bound, PA is activated by proteolytic cleavage 
which results in the release of a 20-kDa N-terminal 
polypeptide. After cleavage and removal of this poly- 
peptide, a domain(s) which binds LF and EF is 
exposed. The binding of LF or EF to the modified 
PA is competitive, suggesting that these proteins 
bind PA at the same site (S.H. Leppla, personal 
communication). After LF or EF is bound to PA, the 
entire toxin complex apparently enters the cell by 
endocytosis (Leppla, 1984; Leppla et al., 1985;
Friedlander, 1986; S.H. Leppla and A.M.
Friedlander, personal communications). Friedlander
(1986) has shown that for LF, at least, an acid- 
dependent step is utilized. Since LF and EF appear 
to compete with each other for binding to PA, it is 
anticipated that LF and EF should share at least 
some aa sequence homology. 

The goal of our studies on the anthrax toxin genes 
is to use our recombinant clones to develop a more 
effective human anthrax vaccine. We should also be 
able to better understand the role of EF and LF in 
the pathogenesis of B. anthracis. In addition, our 
studies should help elucidate how EF and LF inter- 
act with PA and how these toxins enter the cell. In 
this communication, we describe the complete nt 
sequence and the deduced aa sequence for the LF 
gene (lef). We also show that LF and EF share a
highly conserved N-terminus, which is probably re- 
quired to bind PA prior to cellular uptake. 



MATERIALS AND METHODS

(a) Reagents and enzymes

DNA restriction and modifying enzymes were
obtained from Bethesda Research Laboratories
(Gaithersburg, MD). For sequencing, deoxyribonucleoside
and dideoxyribonucleoside triphosphates,
as well as the modified T7 DNA polymerase
(Sequenase), were purchased from U.S. Biochemical
Corp. (Cleveland, OH). Deoxyribonucleoside
[α-32P]triphosphates (800 Curies/mmole) were
obtained from New England Nuclear (Boston, MA).

(b) Nucleotide Sequence analysis 
Fig. 1. Recombinant plasmid containing the lef gene and sequencing
strategy. (A) Restriction map of pLF71. DNA which is
from pXO1 (Kaspar and Robertson, 1987) is shown as the heavy
solid line, and from pTZ18R (Mead et al., 1986) as a thin line.
The arrow inside the circle depicts the starting point for LF
translation and shows the direction for lef gene transcription.
Several restriction sites are also shown. (B) Sequencing strategy
of the B. anthracis lef gene. The nt sequence was determined
using the Sanger et al. (1977) dideoxynucleotide termination
techniques. Each of the short arrows represents independent nt
sequence determinations using different synthetic oligos. Some
of the restriction sites deduced from the nt sequence are also
shown and have been confirmed by restriction enzyme digestion.

and Leppla, 1986) isolated from the 176-kb toxin
plasmid pXO1 (Kaspar and Robertson, 1987).
Using a synthetic oligo specific for the N-terminus of
LF (Robertson and Leppla, 1986), we identified
(using primer elongation) the start of the lef gene
about 600 bp upstream from the EcoRI site with
transcription proceeding away from the BamHI recognition
site (see Fig. 1A; unpublished data of
authors). Our initial nt sequencing reaction showed
that the nt sequence downstream from the binding
site of this LF-specific oligo contained the codons for
the first 16 aa of LF (Fig. 2), completely matching
the aa sequence at the LF N-terminus (J. Schmidt,
USAMRIID, personal communication).



We sequenced almost 3300 bp, including the 5'-
and 3'-noncoding flanking regions of the lef gene.
Fig. 1B shows the sequencing strategy for determination
of the lef gene sequence using the dideoxynucleotide
termination technique described by Sanger
et al. (1977). Synthetic oligos were used to prime
DNA synthesis along the DNA in both directions
(see MATERIALS AND METHODS, section b).



(b) Translation and transcription regulatory regions



Figure 2 shows the complete nt sequence of the lef
structural gene with its flanking regions. There was
a single long ORF, which encoded an 809-aa protein.
The aa sequence for the first 16 aa residues of mature
LF has been determined (J. Schmidt, personal communication)
and is underlined in Fig. 2 (nt 580-627).
The first ATG codon (nt 481) upstream from the
codons which specified the start of mature LF was
preceded by its probable RBS (AAAGGAG), located
at nt positions 465-471. If this entire RBS sequence
base-paired with the ribosome, the calculated
free energy for this interaction would be
-18.8 kcal/mole (-78.7 kJ/mole) (Tinoco et al.,
1973), which is similar to RBS for genes from other
Gram-positive bacteria (Schwarz et al., 1988). This
nt sequence was identical to the probable RBS for
the pag gene (Welkos et al., 1988) and close to the
sequence for the cya gene (AAAGGAGGT)
(Robertson et al., 1988). A consensus RBS for
B. anthracis is not known, but the cya, lef and pag
RBS sequences were complementary to the 3'-end of
the 16S rRNA of the Gram-positive B. subtilis
(Moran et al., 1982; Band and Henner, 1984). Since
LF translation probably begins at the ATG codon
ten nt downstream from this sequence, the LF-precursor
would contain a 33-aa signal peptide,
which is then cleaved to generate mature LF in
B. anthracis. Proteolytic cleavage may not be required
for enzyme activity, however, since LF
isolated from E. coli, which was intracellular and
probably not cleaved, is biochemically active
(Robertson and Leppla, 1986).
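
The complementarity argument can be checked mechanically. The sketch below is illustrative only: the 3'-terminal sequence used for the B. subtilis 16S rRNA is recalled from the literature and is an assumption (it is not given in this paper), and the helper names are hypothetical. The last line checks that the two free-energy figures quoted above (-18.8 kcal/mole and -78.7 kJ/mole) agree under 1 kcal = 4.184 kJ.

```python
# Sketch: are the toxin-gene RBS candidates complementary to the 3' end
# of B. subtilis 16S rRNA?
# ASSUMPTION: the rRNA tail below (written 5'->3') is recalled from the
# literature; it is not a value reported in this paper.
RRNA_16S_3PRIME_END = "GAUCACCUCCUUUCU"

RBS_CANDIDATES = {
    "lef": "AAAGGAG",    # also reported as the probable pag RBS
    "cya": "AAAGGAGGT",
}

COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def anti_sd_partner(dna: str) -> str:
    """Return the RNA strand (5'->3') that pairs perfectly with `dna`."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def pairs_with_rrna(dna: str, rrna: str = RRNA_16S_3PRIME_END) -> bool:
    """True if the perfect partner of `dna` occurs within the rRNA tail."""
    return anti_sd_partner(dna) in rrna


for gene, rbs in RBS_CANDIDATES.items():
    print(gene, rbs, anti_sd_partner(rbs), pairs_with_rrna(rbs))

# Unit check on the quoted free energy: 1 kcal = 4.184 kJ.
KCAL_TO_KJ = 4.184
print(round(-18.8 * KCAL_TO_KJ, 1))
```

Perfect complementarity is only a proxy for the free-energy calculation of Tinoco et al. (1973); the sketch does not attempt to reproduce the nearest-neighbour stacking energies themselves.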

The lef gene RBS appeared to function well in
B. anthracis, B. subtilis (unpublished data of authors)
and even in E. coli (Robertson and Leppla, 1986).
We have not yet identified the position or sequence
of either the pag, lef or cya promoters. Therefore,
until S1 mapping or primer elongation experiments


[Fig. 2 sequence panel. The nt sequence of lef and its flanking regions is
numbered from nt 1, with the deduced aa sequence printed beneath the coding
region. Legible landmarks: r.b.s. AAAGGAG (nt 465-471); signal peptide
MetAsnIleLysLysGluPheIleLysValIleSerMetSerCysLeuValThrAlaIle
ThrLeuSerGlyProValPheIleProLeuValGlnGly; +1 of mature LF at
AlaGlyGlyHisGlyAsp...; C-terminal residues ...AspGlnIleLysPheIleIleAsnSer,
followed by a TAA stop codon.]



Fig. 2. Nucleotide and deduced aa sequence for the lef structural gene with its 5'- and 3'-noncoding flanking regions. The presumptive
RBS (r.b.s.) for lef and the probable start codon are shown. The 33-aa signal peptide which starts the LF ORF at nt 481, as well as
the first aa (+1, Ala) of mature LF (nt 580) are also shown. The first 16 aa of mature LF, as determined by J. Schmidt (USAMRIID),
are underlined (nt 580-627).
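
The coordinates quoted in the text and in this legend are mutually consistent, which can be verified with a few lines of arithmetic. The following is a minimal sketch; the function and constant names are hypothetical, and the only inputs are the positions and lengths reported above.

```python
# Coordinates as reported in the text and in Fig. 2 (1-based nt positions).
ORF_START_NT = 481        # first nt of the ATG start codon
PRECURSOR_AA = 809        # LF-precursor length
SIGNAL_PEPTIDE_AA = 33    # leader removed during secretion


def codon_start(aa_position: int, orf_start: int = ORF_START_NT) -> int:
    """1-based nt position of the first base of the codon for `aa_position`."""
    return orf_start + 3 * (aa_position - 1)


def codon_end(aa_position: int, orf_start: int = ORF_START_NT) -> int:
    """1-based nt position of the last base of the codon for `aa_position`."""
    return codon_start(aa_position, orf_start) + 2


mature_first_aa = SIGNAL_PEPTIDE_AA + 1
print(codon_start(mature_first_aa))        # 580, the +1 Ala codon
print(codon_end(mature_first_aa + 15))     # 627, end of the 16 sequenced aa
print(3 * PRECURSOR_AA)                    # 2427, ORF length in bp
print(PRECURSOR_AA - SIGNAL_PEPTIDE_AA)    # 776, mature LF length in aa
```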




are performed, the transcription start sites for these
anthrax toxin genes will remain unknown.

(c) Base composition and codon usage 

The base composition of the coding strand of the
lef structural gene was: A = 41%, T = 29%
(A + T = 70% of total), G = 18%, C = 12%
(G + C = 30% of total). The 70% AT base composition
for the coding strand is slightly higher than the
overall AT-base composition for pXO1 and genomic
DNA, which are about 69% (Kaspar and Robertson,
1987). The 5'-noncoding region immediately upstream
from the lef gene has a higher A + T content
(percentage illegible), which seems to be characteristic of the regulatory
regions for genes of bacilli and related bacteria
(Moran et al., 1982). For example, the pag structural
gene has an A + T content of 67%, but contains a
higher (75%) A + T base composition for its
upstream regulatory sequences (Welkos et al., 1988).
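
Base composition of this kind is a simple count over one strand. The sketch below shows one way to compute it; the function names are hypothetical, the toy string is illustrative and is not the lef sequence, and the final lines only confirm that the percentages reported above sum as stated.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Percentage of each base in `seq` (case-insensitive, A/C/G/T only)."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(seq: str) -> float:
    """Combined A + T percentage of `seq`."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]


# Toy input for illustration only (not taken from the lef gene).
toy = "AATTGGCCAT"
print(base_composition(toy))
print(at_content(toy))

# Consistency check on the percentages reported for the lef coding strand.
reported = {"A": 41, "T": 29, "G": 18, "C": 12}
print(reported["A"] + reported["T"], reported["G"] + reported["C"])
```

Applied separately to the coding region and to the upstream flank, the same function would give the two A + T values that the paragraph compares.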

The codon usage for the entire LF-precursor pro- 
tein is shown in Table I. There is a preference for 



codons which contain an A or T in the third position, 
which likely reflects the high A + T content of the 
gene. For example, codons for aa which have six 
codons (e.g., Leu, Ser and Arg), use the triplet com- 
binations which have the higher A + T contents. 
Similar codon usage was observed for the pag and 
cya genes (Robertson et al., 1988; Welkos et al.,
1988). Overall codon usage for B. anthracis is not
known, but Shields and Sharp (1987) showed that
highly expressed genes from several different unicellular
organisms use codons for which the most
abundant tRNAs are available. 
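
The third-position preference can be illustrated with counts that remain legible in Table I. The sketch below uses only the four two-codon families whose counts are readable for both codons; the amino acid assignments for AAT, AAC and GAC follow from the standard genetic code, and the function name is hypothetical.

```python
# Counts transcribed from legible entries of Table I
# (two-codon families only; codon -> (amino acid, count)).
TABLE_I_SUBSET = {
    "AAA": ("Lys", 63), "AAG": ("Lys", 23),
    "GAA": ("Glu", 59), "GAG": ("Glu", 20),
    "GAT": ("Asp", 51), "GAC": ("Asp", 4),
    "AAT": ("Asn", 49), "AAC": ("Asn", 8),
}


def at_third_fraction(table: dict) -> dict:
    """Per amino acid, the fraction of codons that end in A or T."""
    totals, at_ending = {}, {}
    for codon, (aa, n) in table.items():
        totals[aa] = totals.get(aa, 0) + n
        if codon[2] in "AT":
            at_ending[aa] = at_ending.get(aa, 0) + n
    return {aa: at_ending.get(aa, 0) / totals[aa] for aa in totals}


for aa, frac in at_third_fraction(TABLE_I_SUBSET).items():
    print(f"{aa}: {frac:.2f}")
```

In each of these families the A- or T-ending codon accounts for well over two thirds of the usage, in line with the 70% A + T content of the gene.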

(d) Amino acid sequence of the LF protein 

Figure 2 also includes the deduced aa sequence for 
the full-length LF-precursor (809 aa) with an Mr of
93 798. Since the aa sequence of mature LF actually
begins at aa position 34 of the LF-precursor (at the
Ala residue marked +1; see RESULTS AND DISCUSSION,
section b), the 33-aa leader peptide
preceding this position must be removed during se- 
cretion. This signal peptide conforms to known 
Bacillus leader sequences in that it started with 

TABLE I 

Codon utilization in the lef gene of Bacillus anthracis 




[...]lly positive) and hydrophilic residues
[...]ed by a central core of hydrophobic
[...] 10-22) and then several hydrophilic
[...] 33) prior to the start of the mature
[...]lytic cleavage apparently occurs at a
[...]le bond consistent with signal pro-
[...] or Gly in Bacillus spp. (Pugsley
[...] 1985; MacKay et al., 1986; O'Neill
[...] Fig. 3 shows the aa sequences near the
[...] PA and LF signal peptides. Similar



Fig. 3. Signal peptide analysis. The last three aa at the C-ter-
[...] EF, PA and LF signal peptides are shown,
[...] aa at the C-terminus of the B. subtilis
[...] The numbers indicate the location of aa
[...] the deduced signal peptide cleavage site (t).
[...] -1, -2 and -3) indicate their upstream posi-
[...] positive number (+1) indicates the down-
[...]s relative
(The upper rows of the table are missing; the last surviving entry of the
preceding row is a count of 2.)

Codon  aa   No.    Codon  aa   No.    Codon  aa   No.    Codon  aa   No.

CTG b  Leu    5    CCG b  Pro    3    CAG b  Gln   13    CGG    Arg    1
ATT    Ile    ?    ACT b  Thr   10    AAT    Asn   49    AGT    Ser   16
ATC b  Ile    7    ACC b  Thr    4    AAC b  Asn    8    AGC    Ser    2
ATA    Ile   26    ACA    Thr   13    AAA b  Lys   63    AGA    Arg   12
ATG    Met   10    ACG    Thr    ?    AAG    Lys   23    AGG    Arg    7
GTT b  Val   12    GCT b  Ala   11    GAT    Asp   51    GGT b  Gly   14
GTC    Val    1    GCC    Ala    ?    GAC    Asp    4    GGC    Gly    ?
GTA b  Val   22    GCA b  Ala   12    GAA b  Glu   59    GGA    Gly   14
GTG    Val    5    GCG b  Ala    5    GAG    Glu   20    GGG    Gly    1

?, count not legible.
a Number of codons in the entire cya coding region (se
b Major E. coli tRNA species (Ikemura, 1981).
c Stop codons.






sequences are present for other Bacillus signal
peptides (Pugsley and Schwartz, 1985). The aa sequence
for the end of the subtilisin signal peptide
(Wang et al., 1988) is also included in Fig. 3. Similar
aa residues at the end of these Bacillus signal peptides
are probably required for signal peptidase recognition
and cleavage.

Mature LF, starting with aa position 34 (corresponding
to nt 580), has an Mr of 90237 which is
larger than the previously reported value of about
83 kDa (Leppla et al., 1985). However, recent sizing
of LF by Quinn et al. (1988) showed that LF has an
electrophoretic mobility slightly slower than, or at
least very close to, that for EF, which is about
89 kDa (Leppla et al., 1985; Robertson et al., 1988).
We conclude, therefore, that LF should have an Mr
slightly larger than EF, consistent with our deduced
aa sequence and size calculations. The deduced aa
content for LF is also close to the experimentally
derived values determined from an acid hydrolysis of
LF (J. Schmidt, USAMRIID, personal communication).
It is also interesting that none of the mature
anthrax toxin proteins contains Cys, although LF
contains a single Cys in the signal peptide, which is
removed during secretion. The calculated pI (6.01)
for LF, which is close to the experimentally determined
value of 5.8 (Quinn et al., 1988), reflected the
larger number of acidic (133 aa), compared to basic,
residues (110 aa) in the mature secreted protein.

Initially, we were concerned when the deduced 
size of LF, based on its nt sequence, was larger than 
previously reported (Leppla et al., 1985). In addition, 
when we started to analyze the LF sequence, we 
observed that when the aa sequence of LF was
compared to itself, LF possessed several internal 
repeated regions located between aa 300 and 420 
(see Fig. 4). We also observed that the lef nt se- 
quence contains nt repeats in the regions which 
correspond to the repeated aa domains (data not 
shown). It should be emphasized, however, that the 
repeated domains for LF and lef are not exact dupli- 
cations, but repeats possessing 80-90% homology. 
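
A self-comparison of this kind is commonly done as a dot-matrix scan, in which every window of the sequence is compared with every later window and pairs above an identity threshold are recorded. The paper does not state which program or parameters were used, so the sketch below is only one plausible way to do it; the window size, the threshold, the function name and the toy peptide (which is hypothetical and is not the LF sequence) are all assumptions.

```python
def self_dotplot(seq: str, window: int = 10, min_identity: float = 0.8):
    """Return (i, j) pairs, i < j, where the windows starting at i and j
    share at least `min_identity` identical positions."""
    hits = []
    n = len(seq) - window + 1
    for i in range(n):
        for j in range(i + 1, n):
            same = sum(a == b for a, b in
                       zip(seq[i:i + window], seq[j:j + window]))
            if same / window >= min_identity:
                hits.append((i, j))
    return hits


# Hypothetical toy peptide: two imperfect copies of a 12-aa unit
# (10 of 12 positions identical) separated by unrelated residues.
toy = "MKTAY" + "SLSEEEKELLKR" + "GGPQ" + "SLSQEEKELLNR" + "WHV"
for i, j in self_dotplot(toy):
    print(i, j, toy[i:i + 10], toy[j:j + 10])
```

Hits that line up on a diagonal parallel to the main diagonal mark a repeated region; a threshold near 0.8 matches the 80-90% homology reported for the LF repeats and lets imperfect copies through.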

Therefore, in order to be certain that the DNA 
which we used for nt sequencing was not altered 
during the process of cloning, we analyzed LF- 
specific DNA isolated from pLF7 [which was pre- 
viously shown to encode functional LF (Robertson 
and Leppla, 1986)], pLF71 (Fig. lA) and pLF74 
(see MATERIALS AND METHODS, section b) (which 




Fig. 4. Internal repeated regions for the aa sequence of LF. The
aa sequence was compared to itself and found to
possess several duplicated regions clustered between aa 300 and
420 (located between nt 1380 and 1740 of Fig. 2). It was also
observed, although the data is not shown, that a similar set of
repeats were present in the nt sequence of lef in the region which
corresponds to these aa repeats. The numbers on each axis show
the aa position relative to the entire aa sequence of the
LF-precursor protein.



were used for nt sequencing), and pXO1. We
digested each of these DNAs with EcoRI or HindIII
and then blotted the electrophoretically-separated
DNA onto nitrocellulose. After hybridization with a
lef-specific probe, we observed that each analyzed
DNA contained identically sized DNA fragments
containing the repeated regions for each enzyme.
These results clearly show that our cloned DNAs,
which produce active LF and which were used for nt
sequencing, had been faithfully propagated in E. coli
and are identical to the corresponding region in
pXO1. Consequently, we feel that the nt sequence we
determined for the region of lef which contains the
internal repeats is correct and that lef contains bona
fide repeats. The function, if any, of these repeated
regions in LF is not known.



(e) Relationship of Bacillus anthracis lef and LF to 
other known genes and proteins 

There is no detectable homology between the 
B. anthracis lef gene, or its deduced aa sequence, 
with any other known gene or protein from the 
current GenBank and NBRF databases (March 
1988). However, we have observed that the N-ter- 
minus of LF is homologous to the corresponding 
N-terminal domain of EF. The regions of homology 
between these proteins are shown in Fig. 5. It is 
probable that these homologous aa domains are re- 
quired to bind PA prior to cellular uptake. We also 
determined the hydropathic profiles for the LF- and
EF-precursor proteins (Robertson et al., 1988),
which are shown in Fig. 6. If these homologous
domains are required for binding PA, then it would
be anticipated that LF and EF should have similar
hydropathic profiles. Our analysis indicated that the
conserved domains of EF and LF, which are mostly 
hydrophilic, would probably be located on the sur- 
face where they could interact with PA. 
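
A hydropathic profile of the type named in the legend to Fig. 6 (Hopp and Woods) is a sliding-window average of per-residue hydrophilicity values. The sketch below is a minimal version under stated assumptions: the scale values are recalled from the Hopp and Woods publication and should be checked against it, the window of six residues is the one usually associated with that method rather than a setting reported here, and the function name is hypothetical. The input is the first 20 aa of the LF precursor as read from Fig. 2.

```python
# ASSUMPTION: hydrophilicity values recalled from Hopp and Woods;
# verify against the original table before relying on them.
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def hydropathic_profile(protein: str, window: int = 6) -> list[float]:
    """Mean Hopp-Woods value for each window of `window` residues."""
    values = [HOPP_WOODS[aa] for aa in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# First 20 aa of the LF precursor (signal peptide), one-letter code.
LF_SIGNAL_START = "MNIKKEFIKVISMSCLVTAI"
profile = hydropathic_profile(LF_SIGNAL_START)
print([round(v, 2) for v in profile])
```

With these values the profile starts positive (hydrophilic) over the Lys-rich N-terminus and turns negative over the hydrophobic core, which is the pattern described for the signal peptides in Fig. 6. Multiplying each value by 10 would give the scale used on the figure axis.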

Since the N-termini of LF and EF probably bind 
PA, it is presumed that the catalytic domains of LF 
and EF must reside within their respective C- 
terminal regions. For example, we have recently 
shown that the C-terminus of EF (aa 300-800) 
contains its ATP-binding site and is the region which 
is homologous with the B. pertussis calmodulin-dependent
adenylate cyclase (Glaser et al., 1988;
Robertson, 1988; Robertson et al., 1988). Fig. 7
shows the important structural domains of LF and 




Fig. 6. Hydropathic analysis of the C-terminal domains of the
(A) LF- and (B) EF-precursor proteins. Using the algorithm of
Hopp and Woods (1981), the proposed hydropathic values were
determined. The hydrophilic residues are positive and the actual
calculated numbers are multiplied by 10. (The signal peptides
start with positive values (hydrophilic residues), and then the
hydrophobic regions which have negative values.) The conserved
regions between LF and EF (see Fig. 5) are numbered and have
similar hydropathic characteristics.



EF, which are localized in their N-termini. The
C-terminus of EF contains its catalytic domain, but
while it is likely that the C-terminus of LF also
contains its biochemical activity, we do not know
what this activity might be.
[Fig. 5 alignment panel: aligned aa sequences of the LF- and EF-precursor
proteins, with homology domains 1-5 and EF domains a and b marked.]

Fig. 5. Amino acid homology comparison between LF and EF. The aa sequences for the LF- and EF-precursor proteins were compared.
Five regions of aa homology were observed (domains 1-5). The arrows indicate the first aa of the mature, secreted proteins. Also shown
are the probable calmodulin-binding site (domain a) and ATP-binding site (domain b) of EF, both of which are conserved in the
B. pertussis calmodulin-dependent adenylate cyclase (Robertson, 1988; Robertson et al., 1988). Short vertical lines connect identical aa;
colons connect similar aa.
Fig. 7. Structural domains of LF and EF. The putative PA binding domains for LF and EF (see Fig. 5) are apparently localized within
the first 300 aa of these proteins. The calmodulin-dependent adenylate cyclase domain of EF, which is homologous to the B. pertussis
cyclase (Robertson, 1988), occupies its C-terminal 500 aa region. The probable biochemically functional domain for LF, by analogy,
probably occupies the corresponding C-terminal region of LF.



(f) Conclusions and perspectives

We have cloned and sequenced the B. anthracis lef
gene, which encodes LF. LF is part of the tripartite
protein exotoxin of B. anthracis. The LF-coding gene
has features in common with the B. anthracis PA-coding
(pag) and EF-coding (cya) genes, including
similar RBS and long leader-peptides, which are
apparently cleaved during secretion (Robertson
et al., 1988; Welkos et al., 1988). Codon usage for
each of these toxin genes appeared to be similar, and
none of the mature proteins contained Cys. The size
of mature LF, deduced from its nt sequence, was
90 237 Da, which is close to the experimentally determined
values (Leppla et al., 1985; Quinn et al., 1988).

We now know the complete nt sequences for the
B. anthracis cya, lef and pag toxin genes. Using this
information, we should be able to construct expression
vectors for production of these toxin proteins
which can be used for immunological purposes as
well as studying the biochemistry of these interesting
toxin proteins. In addition, site-specific mutagenesis
experiments are in progress to help define regions
required for activity, which should lead to the production
of a safer recombinant DNA-derived vaccine.

It is also of interest to determine the mechanism
by which EF and LF enter the cell. As indicated
previously, PA binds to a cell surface receptor and
is cleaved prior to binding either EF or LF. Experiments
which are designed to determine whether the
300-aa N-terminus of either LF or EF can be used
to transport heterologous proteins into a cell using
PA as the transporting protein are in progress in our
laboratory. The availability of a specific transport
mechanism to introduce foreign proteins into mammalian
cells could have significant experimental applications.



NOTE ADDED IN PROOF

After this manuscript was accepted, the authors
learned that John Lowe (USAMRIID) has also
determined the complete lef gene sequence with an
ORF encoding the same 809-aa LF-precursor protein.
J. Lowe used manual sequencing and the
Applied Biosystems 370 to determine his sequence.